EXPLICIT ERROR BOUNDS FOR LAZY REVERSIBLE 
MARKOV CHAIN MONTE CARLO 



DANIEL RUDOLF 



Abstract. We prove explicit, i.e., non-asymptotic, error bounds for Markov
chain Monte Carlo methods, such as the Metropolis algorithm. The problem is to
compute the expectation (or integral) of $f$ with respect to a measure $\pi$ which can
be given by a density $\rho$ with respect to another measure. A direct simulation of
the desired distribution by a random number generator is in general not possible.
Thus it is reasonable to use Markov chain sampling with a burn-in. We study such
an algorithm and extend the analysis of Lovász and Simonovits (1993) to obtain
an explicit error bound.



1. Problem description, Introduction 

The paper deals with numerical integration based on Markov chains. The main goal
is to approximate an integral of the form

\[ S(f) := \int_{\Omega} f(x)\,\pi(dx), \tag{1} \]

where $\Omega$ is a given set and $\pi$ a probability measure. In addition we assume that an
oracle which computes function values of $f$ is provided. We generate a Markov chain
$X_1, X_2, \dots$ with transition kernel $K$, having $\pi$ as its stationary distribution. After
a certain burn-in time the function values are averaged over the generated sample
(Markov chain steps). For a given function $f$ and burn-in time, say $n_0$, we get as
approximation

\[ S_{n,n_0}(f) := \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0}). \]

This Markov chain Monte Carlo method (MCMC) for approximating the expectation
plays a crucial role in numerous applications, especially in statistical physics, in
statistics, and in financial mathematics. Certain asymptotic error bounds are known,
which can be proved via isoperimetric inequalities, the Cheeger inequality and
estimates of eigenvalues, see [LS88, Mat99, Mat04]. Here, in contrast, we determine an
explicit error bound for $S_{n,n_0}$. The individual error of such a method $S_{n,n_0}$ and a
function $f$ is measured in the mean square sense, i.e.,

\[ e(S_{n,n_0}, f) := \left( E\,|S_{n,n_0}(f) - S(f)|^2 \right)^{1/2}. \]

Now an outline of the structure of the paper and the main results is given. Section 2
contains the notation used and recalls some relevant statements. The idea of laziness
is introduced in Section 3, where the conductance concept and a convergence property
of the chain are also presented. To obtain results it is useful to restrict ourselves
to Markov chains which have a positive conductance and where the initial distribution
$\nu$, for obtaining the first time step, has a bounded density with respect to $\pi$.
Section 4 contains the new results. Let $\varphi$ be the conductance of the underlying
chain. After a burn-in

\[ n_0 \ge \frac{\log \| d\nu/d\pi \|_\infty}{\varphi^2} \qquad\text{the error obeys}\qquad e(S_{n,n_0}, f) \le \frac{10}{\varphi \sqrt{n}}\, \|f\|_\infty . \]

This implies immediately that the number $n + n_0$ of time steps which are needed
for an error $\varepsilon$ can be bounded by

\[ \left\lceil \frac{\log \| d\nu/d\pi \|_\infty}{\varphi^2} \right\rceil + \left\lceil \frac{100\, \|f\|_\infty^2}{\varphi^2 \cdot \varepsilon^2} \right\rceil . \]



All results are in a general framework, so that after an adaptation the theory can be
applied in different settings, e.g. on a discrete or a continuous state space. In
Section 5 we pick up a problem considered in [MN07]. There the authors use the
Metropolis algorithm for approximating an integral over the $d$-dimensional unit ball
$B^d \subset \mathbb{R}^d$ with respect to an unnormalised density. The strictly positive density is
denoted by $\rho$; moreover we assume that it is logconcave and that $\alpha$ is the Lipschitz
constant of $\log \rho$. Let $\delta > 0$ and let $B(x, \delta)$ be the ball with radius $\delta$ around $x$. Then we
suggest the method $S^{\delta}_{n,n_0}$ described in Algorithm 1 for the approximation of

\[ S(f) = S(f, \rho) = \frac{\int_{B^d} f(x)\rho(x)\,dx}{\int_{B^d} \rho(x)\,dx} . \]

It is shown that for $\delta = \min\{1/\sqrt{d+1},\, 1/\alpha\}$ the error obeys

\[ e(S^{\delta}_{n,n_0}, f) \le 8000\, \frac{\sqrt{d+1}\, \max\{\sqrt{d+1},\, \alpha\}}{\sqrt{n}} , \]

where the burn-in time $n_0$ is chosen larger than $1280000 \cdot \alpha (d+1) \max\{d+1, \alpha^2\}$.
It is worth pointing out that the number of time steps which we use for sampling
depends polynomially on the dimension and also polynomially on the Lipschitz constant
$\alpha$ of the densities. As already mentioned, the same integration problem was studied
in [MN07]. The authors asked whether the problem is tractable, that is, whether the
number of function evaluations needed to obtain an error smaller than $\varepsilon$ can be
bounded polynomially in the dimension and the Lipschitz constant. We give a positive
answer: the problem is tractable, at least if we consider bounded integrands $f$.
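The method referred to as Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `uniform_in_ball` and `lazy_metropolis_estimate` are hypothetical, the density `rho` is assumed strictly positive, and the uniform sampling in a ball uses the standard Gaussian-direction construction, which is not prescribed by the paper.

```python
import math
import random


def uniform_in_ball(center, radius, rng):
    """Draw a point uniformly from the Euclidean ball B(center, radius)."""
    d = len(center)
    # A standard Gaussian vector points in a uniformly distributed direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction))
    # Scaling by U^(1/d) makes the point uniform with respect to volume.
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * v / norm for c, v in zip(center, direction)]


def lazy_metropolis_estimate(f, rho, d, delta, n, n0, rng=None):
    """Approximate S(f, rho) on the unit ball B^d along the lines of Algorithm 1."""
    rng = rng or random.Random()
    x = uniform_in_ball([0.0] * d, 1.0, rng)   # step (1): X_1 uniform on B^d
    total = 0.0
    for i in range(1, n + n0 + 1):             # step (2); x holds X_i here
        if i > n0:
            total += f(x)                      # collect f(X_{n0+1}), ..., f(X_{n0+n})
        if rng.random() < 0.5:
            continue                           # lazy step: X_{i+1} := X_i
        y = uniform_in_ball(x, delta, rng)     # proposal Y uniform in B(X_i, delta)
        inside = sum(v * v for v in y) <= 1.0
        # Accept with probability min{1, rho(Y)/rho(X_i)}; rho must be positive.
        if inside and rng.random() * rho(x) < rho(y):
            x = y
    return total / n                           # step (3)
```

Each pass through the loop evaluates `rho` at most twice and `f` at most once, which is why the number of time steps $n + n_0$ is the natural cost measure for the method.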



2. Notation and basics 

In this section we explain the most important facts and definitions which we are going
to use in the analysis. For introductory literature on general Markov chains we refer
the reader to [MT93], [Num84] or [Rev84]. Throughout this study we assume that
$(\Omega, \mathcal{A})$ is a countably generated measurable space. Then we call $K : \Omega \times \mathcal{A} \to [0, 1]$
a Markov kernel or transition kernel if

(i) for each $x \in \Omega$ the mapping $A \in \mathcal{A} \mapsto K(x, A)$ induces a probability measure
on $\Omega$,

(ii) for each $A \in \mathcal{A}$ the mapping $x \in \Omega \mapsto K(x, A)$ is an $\mathcal{A}$-measurable real
function.

Algorithm: $S^{\delta}_{n,n_0}(f, \rho)$

(1) choose $X_1$ randomly on $B^d$;

(2) for $i = 1, \dots, n + n_0$ do

• if rand() < 1/2 then $X_{i+1} := X_i$;

• else

- choose $Y \in B(X_i, \delta)$ uniformly;

- if $Y \notin B^d$ then $X_{i+1} := X_i$;

- if $Y \in B^d$ and $\rho(Y) \ge \rho(X_i)$ then $X_{i+1} := Y$;

- if $Y \in B^d$ and $\rho(Y) < \rho(X_i)$ then

- $X_{i+1} := Y$ with probability $\rho(Y)/\rho(X_i)$ and

- $X_{i+1} := X_i$ with probability $1 - \rho(Y)/\rho(X_i)$.

(3) Return: $S^{\delta}_{n,n_0}(f, \rho) := \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0})$.

Algorithm 1: Metropolis algorithm for $S(f, \rho)$

In addition $\mathcal{M} = (\Omega, \mathcal{A}, \{K(x, \cdot) : x \in \Omega\})$ is the associated Markov scheme. This
notation is taken from [LS93]. A Markov chain $X_1, X_2, \dots$ is given through a Markov
scheme $\mathcal{M}$ and a start distribution $\nu$ on $\Omega$. The transition kernel $K(x, A)$ of the
Markov chain describes the probability of getting from $x \in \Omega$ to $A \in \mathcal{A}$ in one
step. Another important assumption is that the given distribution $\pi$ is stationary
with respect to the considered Markov chain, i.e. for all $A \in \mathcal{A}$

\[ \pi(A) = \int_{\Omega} K(x, A)\,\pi(dx). \]

Roughly speaking this means: if the starting point is chosen with distribution $\pi$, then
after one step we have the same distribution as before. A similar but stronger
restriction on the chain is reversibility. A Markov scheme is reversible with respect
to $\pi$ if for all $A, B \in \mathcal{A}$

\[ \int_{B} K(x, A)\,\pi(dx) = \int_{A} K(x, B)\,\pi(dx). \]

The next result is taken from [LS93]. Since it is not proven there, we give
an idea of the proof.

Lemma 1. Let $\mathcal{M}$ be a reversible Markov scheme and let $F : \Omega \times \Omega \to \mathbb{R}$ be
integrable. Then

\[ \int_{\Omega}\int_{\Omega} F(x,y)\, K(x,dy)\,\pi(dx) = \int_{\Omega}\int_{\Omega} F(y,x)\, K(x,dy)\,\pi(dx). \tag{2} \]

Proof. The result is shown using a standard technique of integration theory. Since
the Markov scheme is reversible we have

\[ \int_{\Omega}\int_{\Omega} 1_{A\times B}(x,y)\, K(x,dy)\,\pi(dx) = \int_{\Omega}\int_{\Omega} 1_{A\times B}(y,x)\, K(x,dy)\,\pi(dx) \]

for $A, B \in \mathcal{A}$. Having finished this we develop the equality of the integrals for an
arbitrary set $C \in \mathcal{A} \otimes \mathcal{A}$, where $\mathcal{A} \otimes \mathcal{A}$ is the product $\sigma$-algebra of $\mathcal{A}$ with itself.
This is an application of the Dynkin system theorem. Then we consider the case
where $F$ is a simple function, which is straightforward. The next step is to obtain
the equality for positive functions and after that to extend the result to general
integrable ones. □

Remark 1. If we have a Markov scheme which is not necessarily reversible but has
a stationary distribution, the following holds true:

\[ S(f) = \int_{\Omega} f(x)\,\pi(dx) = \int_{\Omega}\int_{\Omega} f(y)\, K(x,dy)\,\pi(dx), \]

where $f : \Omega \to \mathbb{R}$ is integrable. This can be seen easily by using the same steps as
in the proof of Lemma 1.

By $K^n(x, \cdot)$ we denote the $n$-step transition probabilities and we have for $x \in \Omega$,
$A \in \mathcal{A}$ that

\[ K^n(x,A) := \int_{\Omega} K^{n-1}(y,A)\, K(x,dy) = \int_{\Omega} K(y,A)\, K^{n-1}(x,dy). \]

This again constitutes a transition kernel of a Markov chain sharing the invariant
distribution and reversibility with the original one. Thus the results of Lemma 1
and Remark 1 also hold for the $n$-step transition probabilities, i.e.

\[ \int_{\Omega}\int_{\Omega} F(x,y)\, K^n(x,dy)\,\pi(dx) = \int_{\Omega}\int_{\Omega} F(y,x)\, K^n(x,dy)\,\pi(dx). \tag{3} \]

Now we define for a Markov scheme $\mathcal{M}$ a nonnegative operator $P : L_\infty(\Omega, \pi) \to
L_\infty(\Omega, \pi)$ by

\[ (Pf)(x) = \int_{\Omega} f(y)\, K(x,dy). \]

(Nonnegative means: if $f \ge 0$ then $Pf \ge 0$.) This operator is called the Markov or
transition operator of the Markov scheme $\mathcal{M}$ and describes the expected
value of $f$ after one step of the Markov chain started from $x \in \Omega$. The expected value of
$f$ after $n$ steps of the Markov chain started from $x \in \Omega$ is given as

\[ (P^n f)(x) = \int_{\Omega} f(y)\, K^n(x,dy). \]

Let us now consider $P$ on the Hilbert space $L_2(\Omega, \pi)$, where $\langle f, g\rangle = \int_{\Omega} f(x) g(x)\,\pi(dx)$
denotes the canonical scalar product. Notice that the considered function space is
chosen according to the invariant measure. Then we have with Lemma 1

\[ \langle f, f\rangle \pm \langle f, Pf\rangle = \frac{1}{2} \int_{\Omega}\int_{\Omega} \bigl(f(x) \pm f(y)\bigr)^2 K(x,dy)\,\pi(dx) \ge 0. \tag{4} \]

From a functional analysis point of view this means $\|P\|_{L_2 \to L_2} \le 1$. It is
straightforward to show that $\|P^n\|_{L_p \to L_p} \le 1$ for $p = 1, 2$ or $\infty$ and $n \in \mathbb{N}$.
Let $X_1, X_2, \dots$ be the result of a reversible Markov chain. The expectation of the
chain with starting distribution $\nu = \pi$ and Markov kernel $K$ from scheme $\mathcal{M}$ is

\[ E_{\pi,K}\bigl(f(X_i)\bigr) = \langle f, 1\rangle = S(f), \qquad E_{\pi,K}\bigl(f(X_i) f(X_j)\bigr) = \langle f, P^{|i-j|} f\rangle . \]

The assumption that the initial distribution is the stationary one makes the
calculation easy. In the general case, where the starting point is chosen by a given
probability distribution $\nu$, we obtain for $i \le j$ and functions $f \in L_2(\Omega, \pi)$

\[ E_{\nu,K}\bigl(f(X_i)\bigr) = \int_{\Omega} P^i f(x)\,\nu(dx), \qquad E_{\nu,K}\bigl(f(X_i) f(X_j)\bigr) = \int_{\Omega} P^i\bigl(f(x)\, P^{j-i} f(x)\bigr)\,\nu(dx). \]

It is easy to verify with (2) that $P$ is self-adjoint as an operator on $L_2(\Omega, \pi)$. In the
next part we are going to get one more convenient characteristic of $P$ under some
additional restrictions.



3. Laziness and Conductance

An introduction to laziness and a more detailed view on the conductance is given in
[LS93]. Most results which we are going to mention here are taken from this reference.
A Markov scheme $\mathcal{M} = (\Omega, \mathcal{A}, \{K(x, \cdot) : x \in \Omega\})$ is called lazy if $K(x, \{x\}) \ge 1/2$
for all $x \in \Omega$. This means the chain stays at least with probability 1/2 in the current
state. Notice that the resulting chain from Algorithm 1 is lazy because
of the coin toss at the beginning of step (2). The crucial reason for slowing the chain
down in this way is that the associated Markov operator $P$ becomes positive semidefinite.
Therefore we study only lazy chains. This is formalized in the next Lemma.

Lemma 2. Let $\mathcal{M}$ be a lazy, reversible Markov scheme. Then we have for $f \in L_2(\Omega, \pi)$

\[ \langle Pf, f\rangle \ge 0. \tag{6} \]

Proof. We consider another Markov scheme $\widetilde{\mathcal{M}} := (\Omega, \mathcal{A}, \{\widetilde{K}(x, \cdot) : x \in \Omega\})$, where
$\widetilde{K}(x, A) = 2K(x, A) - I(x, A)$ with

\[ I(x, A) = \begin{cases} 1, & x \in A, \\ 0, & x \in A^c, \end{cases} \]

for all $A \in \mathcal{A}$. To verify that $\widetilde{K}$ is again a transition kernel we need $K(x, \{x\}) \ge 1/2$.
The reversibility condition for $\widetilde{\mathcal{M}}$ holds, since the scheme $\mathcal{M}$ is reversible. The Markov
operator of $\widetilde{\mathcal{M}}$ is given by $\widetilde{P} = (2P - I)$, where $I$ is the identity. Since we established
reversibility of the new scheme we obtain by applying Lemma 1 equality (4) for $\widetilde{P}$.
So it is true that

\[ -\langle f, f\rangle \le \langle (2P - I) f, f\rangle \le \langle f, f\rangle . \]

Now let us consider

\[ \langle Pf, f\rangle = \frac{1}{2}\langle f, f\rangle + \frac{1}{2}\langle (2P - I) f, f\rangle \ge 0, \]

such that the claim is proven.

□
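The effect described in Lemma 2 can be checked numerically on a small finite state space. The following sketch is only an illustration under assumptions that are not part of the paper: the three-state path chain, the vector `f` and the helper names `inner`, `apply_kernel` and `lazify` are hypothetical choices, and integrals become finite sums.

```python
def inner(u, v, pi):
    """Scalar product <u, v> in L_2(pi) on a finite state space."""
    return sum(a * b * w for a, b, w in zip(u, v, pi))


def apply_kernel(K, f):
    """Markov operator: (Pf)(x) = sum over y of f(y) K(x, y)."""
    return [sum(k * fy for k, fy in zip(row, f)) for row in K]


def lazify(K):
    """Lazy version of a kernel: stay put with probability 1/2."""
    n = len(K)
    return [[0.5 * K[x][y] + (0.5 if x == y else 0.0) for y in range(n)]
            for x in range(n)]


# Hypothetical example: simple random walk on the path 0 - 1 - 2,
# which is reversible with respect to pi = (1/4, 1/2, 1/4).
K = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
pi = [0.25, 0.5, 0.25]
f = [1.0, -1.0, 1.0]

plain = inner(apply_kernel(K, f), f, pi)          # negative for the non-lazy walk
lazy = inner(apply_kernel(lazify(K), f), f, pi)   # nonnegative after lazification
```

The non-lazy walk has an eigenvalue $-1$, so $\langle Pf, f\rangle$ can be negative; the lazy version shifts the spectrum into $[0, 1]$.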






Having finished this, we can turn to the conductance of the Markov chain. For a
Markov scheme $\mathcal{M} = (\Omega, \mathcal{A}, \{K(x, \cdot) : x \in \Omega\})$, which is not necessarily lazy, it is
defined by

\[ \varphi(K, \pi) = \inf_{0 < \pi(A) \le 1/2} \frac{\int_{A} K(x, A^c)\,\pi(dx)}{\pi(A)} , \]

where $\pi$ is a stationary distribution. The numerator of the conductance describes
the probability of leaving $A$ in one step, where the starting point is chosen by
$\pi$. An important requirement for the following is that the scheme has a positive
conductance, since the next result is not useful otherwise.
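On a finite state space the infimum in the definition of the conductance is a minimum over finitely many sets and can be evaluated by brute force. The sketch below assumes a kernel given as a row-stochastic matrix `K` and a stationary vector `pi`; the function name `conductance` and the three-state example are hypothetical, and the enumeration is exponential in the number of states, so it is only meant for tiny examples.

```python
from itertools import combinations


def conductance(K, pi):
    """Minimum over sets A with 0 < pi(A) <= 1/2 of flow(A -> A^c) / pi(A)."""
    n = len(pi)
    best = float("inf")
    for size in range(1, n):
        for A in combinations(range(n), size):
            mass = sum(pi[x] for x in A)
            if not 0.0 < mass <= 0.5 + 1e-12:
                continue
            inside = set(A)
            # Probability of leaving A in one step when started according to pi.
            flow = sum(pi[x] * K[x][y]
                       for x in A for y in range(n) if y not in inside)
            best = min(best, flow / mass)
    return best


# Hypothetical example: lazy random walk on the path 0 - 1 - 2,
# stationary with respect to pi = (1/4, 1/2, 1/4).
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
phi = conductance(K, pi)
```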

Lemma 3. Let $\mathcal{M}$ be a lazy, reversible Markov scheme and let $\nu$ be the initial
distribution. Furthermore we assume that the probability distribution $\nu$ has a bounded
density function $\frac{d\nu}{d\pi}$ with respect to $\pi$. Then for $A \in \mathcal{A}$ we obtain

\[ \left| \int_{\Omega} K^j(x,A)\,\nu(dx) - \pi(A) \right| \le \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} . \tag{7} \]

Proof. See [LS93, Corollary 1.5, p. 372] and translate it into our
notation. □

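Remark 2. The left-hand side of (7) can be transformed as follows:

\begin{align*}
\int_{\Omega} K^j(x,A)\,\nu(dx) - \pi(A)
 &= \int_{\Omega}\int_{A} K^j(x,dy)\, \frac{d\nu}{d\pi}(x)\,\pi(dx) - \pi(A) \\
 &\overset{(3)}{=} \int_{A}\int_{\Omega} \frac{d\nu}{d\pi}(y)\, K^j(x,dy)\,\pi(dx) - \int_{A}\int_{\Omega} \frac{d\nu}{d\pi}(y)\,\pi(dy)\,\pi(dx) \\
 &= \int_{A}\int_{\Omega} \frac{d\nu}{d\pi}(y)\, \bigl( K^j(x,dy) - \pi(dy) \bigr)\,\pi(dx).
\end{align*}

Now it is clear that with Lemma 3 for $A \in \mathcal{A}$

\[ \left| \int_{A}\int_{\Omega} \frac{d\nu}{d\pi}(y)\, \bigl( K^j(x,dy) - \pi(dy) \bigr)\,\pi(dx) \right| \le \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} . \tag{8} \]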

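Remark 3. Observe that we obtained a bound for the speed of convergence to
stationarity of the considered Markov chain. Once more it is possible to estimate the
right-hand side of (8); in detail,

\[ \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} \le \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty}\, \exp\left( -j\, \frac{\varphi(K,\pi)^2}{2} \right) \tag{9} \]

holds true.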



To use the conductance we need a connection to the operator $P$. This is given in
the form of the so-called Cheeger inequality. Before we state this result
in a slightly different formulation we define a subset of $L_2(\Omega, \pi)$ as follows:

\[ L_2^0(\Omega, \pi) := \{ f \in L_2(\Omega, \pi) : S(f) = 0 \}. \]

Lemma 4 (Cheeger's inequality). Let $\mathcal{M}$ be a reversible Markov scheme with
conductance $\varphi(K, \pi)$. Then for $g \in L_2^0$

\[ \langle Pg, g\rangle \le \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right) \|g\|_2^2 . \tag{10} \]

Proof. See [LS93, Corollary 1.8, p. 375]. □

Remark 4. There are many other references where the convergence rate of Markov
chains to stationarity is studied, see e.g. [JS89, DS91, Ros95, RR04]. One approach
is to bound the second eigenvalue of the operator $P$. The relation between the
eigenvalue and the conductance of a Markov chain is given by Cheeger's inequality
(see Lemma 4). In this context the laziness condition shifts the spectrum of the
Markov operator $P$ restricted to $L_2^0$ from $(-1, 1)$ by the transformation described in
Lemma 2 to $(0, 1)$, i.e. the second eigenvalue is always positive.

4. Error bounds

This section contains the main result and its proof. At first we repeat an
already known result, which is used to show an explicit error bound for a general
Markov scheme with initial probability distribution $\nu$. Most arguments to obtain
that result are from [LS93] and [Mat99].

The next result considers an algorithm under the assumption that the starting
point is chosen according to the stationary distribution. So a preliminary burn-in
period is not necessary, since we are already at the invariant distribution.

Theorem 5. Let $\mathcal{M}$ be a lazy, reversible Markov scheme with stationary distribution
$\pi$, let $X_1, X_2, \dots$ be a Markov chain generated by $\mathcal{M}$ with initial distribution $\pi$. Let
$f \in L_2(\Omega, \pi)$, $S(f) = \int_{\Omega} f(x)\,\pi(dx)$ and $S_n(f) := S_{n,0}(f) = \frac{1}{n} \sum_{j=1}^{n} f(X_j)$. Then we
obtain

\[ e(S_n, f)^2 = E_{\pi,K} |S(f) - S_n(f)|^2 \le \frac{4}{\varphi(K,\pi)^2 \cdot n}\, \|f\|_2^2 . \]

Remark 5. This proof is again taken from [LS93, Theorem 1.9, p. 375]. Since it is
very important in our analysis and because of the slightly different notation we will
repeat it.

Proof. Let $g := f - S(f)$, such that $g \in L_2^0$. Then we have with Lemma 2, Lemma 4
and $\|g\|_2 \le \|f\|_2$ that

\begin{align*}
E_{\pi,K} |S(f) - S_n(f)|^2
 &= E_{\pi,K} \Bigl| \frac{1}{n} \sum_{j=1}^{n} g(X_j) \Bigr|^2
  = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{i=1}^{n} E_{\pi,K}\bigl( g(X_j) g(X_i) \bigr)
  = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{i=1}^{n} \langle P^{|i-j|} g, g\rangle \\
 &= \frac{1}{n^2} \Bigl( n \langle g, g\rangle + 2 \sum_{k=1}^{n-1} (n-k) \langle P^k g, g\rangle \Bigr)
  \le \frac{2}{n} \sum_{k=0}^{n-1} \langle P^k g, g\rangle
  \le \frac{2}{n} \sum_{k=0}^{\infty} \langle P^k g, g\rangle \\
 &\le \frac{2}{n} \sum_{k=0}^{\infty} \Bigl( 1 - \frac{\varphi(K,\pi)^2}{2} \Bigr)^{k} \|g\|_2^2
  = \frac{4}{\varphi(K,\pi)^2 \cdot n}\, \|g\|_2^2
  \le \frac{4}{\varphi(K,\pi)^2 \cdot n}\, \|f\|_2^2 .
\end{align*}

Notice that laziness is essentially used by applying $\langle P^k g, g\rangle \ge 0$ in the second
inequality. □

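Let us consider the more general case, where the initial distribution is not the
stationary one. In the next statement a relation between the error of starting with
$\pi$ and the error of starting not with the invariant distribution is established.

Lemma 6. Let $\mathcal{M}$ be a reversible Markov scheme with stationary distribution $\pi$, let
$X_1, X_2, \dots$ be a Markov chain generated by $\mathcal{M}$ with initial distribution $\nu$. Let $\frac{d\nu}{d\pi}$ be
a bounded density of $\nu$ with respect to $\pi$. Then we get for $g := f - S(f) \in L_2^0$

\begin{align*}
E_{\nu,K} |S(f) - S_{n,n_0}(f)|^2 ={}& E_{\pi,K} |S(f) - S_n(f)|^2 \tag{11} \\
 &+ \frac{1}{n^2} \sum_{j=1}^{n} \int_{\Omega}\int_{\Omega} \frac{d\nu}{d\pi}(y) \bigl( K^{n_0+j}(x,dy) - \pi(dy) \bigr)\, g(x)^2\,\pi(dx) \\
 &+ \frac{2}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \int_{\Omega}\int_{\Omega} \frac{d\nu}{d\pi}(y) \bigl( K^{n_0+j}(x,dy) - \pi(dy) \bigr)\, g(x)\, P^{k-j} g(x)\,\pi(dx).
\end{align*}

Proof. It is easy to see that

\begin{align*}
E_{\nu,K} |S(f) - S_{n,n_0}(f)|^2
 &= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{i=1}^{n} E_{\nu,K}\bigl( g(X_{n_0+j})\, g(X_{n_0+i}) \bigr) \\
 &= \frac{1}{n^2} \sum_{j=1}^{n} \int_{\Omega} P^{n_0+j}(g^2)(x)\,\nu(dx)
  + \frac{2}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \int_{\Omega} P^{n_0+j}\bigl( g(x)\, P^{k-j} g(x) \bigr)\,\nu(dx).
\end{align*}

For every function $h \in L_2(\Omega, \pi)$ and $i \in \mathbb{N}$, applying (3), the following
transformation holds true:

\begin{align*}
\int_{\Omega} P^i h(x)\,\nu(dx)
 &= \int_{\Omega}\int_{\Omega} h(y)\, K^i(x,dy)\, \frac{d\nu}{d\pi}(x)\,\pi(dx) \\
 &\overset{(3)}{=} \int_{\Omega}\int_{\Omega} \frac{d\nu}{d\pi}(y)\, K^i(x,dy)\, h(x)\,\pi(dx) \\
 &= \int_{\Omega} h(x)\,\pi(dx) + \int_{\Omega}\int_{\Omega} \frac{d\nu}{d\pi}(y) \bigl( K^i(x,dy) - \pi(dy) \bigr) h(x)\,\pi(dx) \\
 &= \int_{\Omega} P^i h(x)\,\pi(dx) + \int_{\Omega}\int_{\Omega} \frac{d\nu}{d\pi}(y) \bigl( K^i(x,dy) - \pi(dy) \bigr) h(x)\,\pi(dx),
\end{align*}

where the last equality uses the stationarity of $\pi$. Using this in the above setting
with $h = g^2$ and $h = g\, P^{k-j} g$, formula (11) is shown. □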




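The next result is also a helpful tool for proving the main result of this paper. It
modifies the convergence property described in Lemma 3 so that we are
able to use it in the considered context.

Lemma 7. Let $\mathcal{M}$ be a lazy, reversible Markov scheme with stationary distribution
$\pi$, let $\nu$ be the initial distribution of the related Markov chain with bounded
density $\frac{d\nu}{d\pi}$. Then we obtain for $h \in L_\infty(\Omega, \pi)$ and $j \in \mathbb{N}$

\[ \left| \int_{\Omega}\int_{\Omega} \frac{d\nu}{d\pi}(y) \bigl( K^j(x,dy) - \pi(dy) \bigr) h(x)\,\pi(dx) \right| \le 4\, \|h\|_\infty \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} . \]

Proof. At first we define $p_j(x) := \int_{\Omega} \frac{d\nu}{d\pi}(y) \bigl( K^j(x,dy) - \pi(dy) \bigr)$. With the standard
proof technique of integration theory it is easy to see that the measurability of the
density and the kernel carries over to $p_j$. Now we consider the positive and
negative parts of the functions $h$ and $p_j$. To formalize this we use

\begin{align*}
\Omega_{+}^{+} &:= \{ x \in \Omega : p_j(x) \ge 0,\ h(x) \ge 0 \}, \\
\Omega_{+}^{-} &:= \{ x \in \Omega : p_j(x) \ge 0,\ h(x) < 0 \}, \\
\Omega_{-}^{+} &:= \{ x \in \Omega : p_j(x) < 0,\ h(x) \ge 0 \}, \\
\Omega_{-}^{-} &:= \{ x \in \Omega : p_j(x) < 0,\ h(x) < 0 \}.
\end{align*}

These subsets of $\Omega$ are all included in the $\sigma$-algebra $\mathcal{A}$, since $p_j$ and $h$ are measurable
functions. So applying (8) leads to the following upper bound:

\begin{align*}
\left| \int_{\Omega} p_j(x) h(x)\,\pi(dx) \right|
 &\le \left| \int_{\Omega_{+}^{+}} p_j(x) h(x)\,\pi(dx) \right| + \left| \int_{\Omega_{+}^{-}} p_j(x) h(x)\,\pi(dx) \right| \\
 &\quad + \left| \int_{\Omega_{-}^{+}} p_j(x) h(x)\,\pi(dx) \right| + \left| \int_{\Omega_{-}^{-}} p_j(x) h(x)\,\pi(dx) \right| \\
 &\le \|h\|_\infty \left| \int_{\Omega_{+}^{+}} p_j(x)\,\pi(dx) \right| + \|h\|_\infty \left| \int_{\Omega_{+}^{-}} p_j(x)\,\pi(dx) \right| \\
 &\quad + \|h\|_\infty \left| \int_{\Omega_{-}^{+}} p_j(x)\,\pi(dx) \right| + \|h\|_\infty \left| \int_{\Omega_{-}^{-}} p_j(x)\,\pi(dx) \right| \\
 &\le 4\, \|h\|_\infty \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} .
\end{align*}

□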

Now all results are available to obtain our main error bound for the MCMC method.

Theorem 8. Let $X_1, X_2, \dots$ be a lazy, reversible Markov chain, defined by the
scheme $\mathcal{M}$ and the initial distribution $\nu$. Let the initial distribution have a bounded
density $\frac{d\nu}{d\pi}$ with respect to $\pi$. Let $S_{n,n_0}(f) = \frac{1}{n} \sum_{j=1}^{n} f(X_{n_0+j})$ be the approximation
of $S(f) = \int_{\Omega} f(x)\,\pi(dx)$, where $f \in L_\infty(\Omega, \pi)$. Then

\[ e(S_{n,n_0}, f) \le \frac{2 \sqrt{1 + 24 \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty}\, \exp\left( -n_0\, \frac{\varphi(K,\pi)^2}{2} \right)}}{\varphi(K,\pi) \cdot \sqrt{n}}\, \|f\|_\infty . \]

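Proof. By Lemma 6 and Lemma 7, where $g := f - S(f)$, we have

\begin{align*}
E_{\nu,K} |S(f) - S_{n,n_0}(f)|^2 \le{}& E_{\pi,K} |S(f) - S_n(f)|^2
 + \frac{4 \|g\|_\infty^2}{n^2} \sum_{j=1}^{n} \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j+n_0} \\
 &+ \frac{8 \|g\|_\infty}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \| P^{k-j} g \|_\infty \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j+n_0} .
\end{align*}

For an easier notation we define

\[ \varepsilon_0 := \sqrt{\left\| \frac{d\nu}{d\pi} \right\|_\infty}\, \exp\left( -n_0\, \frac{\varphi(K,\pi)^2}{2} \right). \tag{12} \]

Taking (9) and (12) into account the following transformation is true:

\begin{align*}
E_{\nu,K} |S(f) - S_{n,n_0}(f)|^2 \le{}& E_{\pi,K} |S(f) - S_n(f)|^2
 + \frac{4 \varepsilon_0 \|g\|_\infty^2}{n^2} \sum_{j=1}^{n} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} \\
 &+ \frac{8 \varepsilon_0 \|g\|_\infty}{n^2} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \| P^{k-j} g \|_\infty \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} .
\end{align*}

With the geometric series and $\|P^i\|_{L_\infty \to L_\infty} \le 1$ for all $i \in \mathbb{N}$ we get

\begin{align*}
E_{\nu,K} |S(f) - S_{n,n_0}(f)|^2
 &\le E_{\pi,K} |S(f) - S_n(f)|^2 + \frac{8 \varepsilon_0 \|g\|_\infty^2}{\varphi(K,\pi)^2 \cdot n^2} + \frac{8 \varepsilon_0 \|g\|_\infty^2}{n} \sum_{j=1}^{n-1} \left( 1 - \frac{\varphi(K,\pi)^2}{2} \right)^{j} \\
 &\le E_{\pi,K} |S(f) - S_n(f)|^2 + \frac{8 \varepsilon_0 \|g\|_\infty^2}{\varphi(K,\pi)^2 \cdot n^2} + \frac{16 \varepsilon_0 \|g\|_\infty^2}{\varphi(K,\pi)^2 \cdot n} \\
 &\le E_{\pi,K} |S(f) - S_n(f)|^2 + \frac{24 \varepsilon_0 \|g\|_\infty^2}{\varphi(K,\pi)^2 \cdot n} .
\end{align*}

After applying Theorem 5 and using $\|f\|_2 \le \|f\|_\infty$ and $\|g\|_\infty^2 \le 4 \|f\|_\infty^2$ everything is
proven.

□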



The major difference between the new error bound of Theorem 8 and the already
known one from Theorem 5 is that the unrealistic assumption of sampling from the
stationary distribution $\pi$ in the first time step is weakened. It turns out that for a
certain burn-in time $n_0$ a very similar upper bound holds true, if the initial
distribution $\nu$ has a bounded density with respect to $\pi$. A further estimate yields the
next conclusion.

Corollary 9. Let $X_1, X_2, \dots$ be a lazy, reversible Markov chain. The initial
distribution $\nu$ has a bounded density $\frac{d\nu}{d\pi}$ with respect to $\pi$. Then for $f \in L_\infty(\Omega, \pi)$ and
$S_{n,n_0}(f) = \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0})$ after a burn-in

\[ n_0 \ge \frac{\log \left\| \frac{d\nu}{d\pi} \right\|_\infty}{\varphi(K,\pi)^2} \tag{13} \]

the error obeys

\[ e(S_{n,n_0}, f) \le \frac{10}{\varphi(K,\pi) \sqrt{n}}\, \|f\|_\infty . \]

If we denote by cost$(f, \varepsilon)$ the number $n + n_0$ of time steps that are needed for an
optimal algorithm to solve the problem within an error $\varepsilon$, then we can also write

\[ \mathrm{cost}(f, \varepsilon) \le \left\lceil \frac{\log \left\| \frac{d\nu}{d\pi} \right\|_\infty}{\varphi(K,\pi)^2} \right\rceil + \left\lceil \frac{100\, \|f\|_\infty^2}{\varphi(K,\pi)^2 \cdot \varepsilon^2} \right\rceil . \]

Roughly speaking this means: if we control the conductance of the underlying Markov
chain, then we also control the error. So we should look for lower bounds of the
conductance to obtain upper estimates of the error.
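The quantities in Corollary 9 are straightforward to evaluate once a lower bound on the conductance is available. The following sketch computes the burn-in from (13) and the sample size needed for a target error; the function names and parameter names (`density_bound` for $\|d\nu/d\pi\|_\infty$, `f_sup` for $\|f\|_\infty$) are hypothetical, and `phi` may be any known lower bound of $\varphi(K,\pi)$.

```python
import math


def burn_in(density_bound, phi):
    """Smallest integer n0 with n0 >= log(density_bound) / phi^2, cf. (13)."""
    return math.ceil(math.log(density_bound) / phi ** 2)


def sample_size(f_sup, phi, eps):
    """Smallest integer n with 10 * f_sup / (phi * sqrt(n)) <= eps."""
    return math.ceil(100.0 * f_sup ** 2 / (phi ** 2 * eps ** 2))


def cost_bound(density_bound, f_sup, phi, eps):
    """Upper bound on the number n + n0 of time steps from Corollary 9."""
    return burn_in(density_bound, phi) + sample_size(f_sup, phi, eps)
```

Note that the burn-in grows only logarithmically in the density bound, while both terms scale with $1/\varphi^2$; a poor conductance estimate therefore dominates the cost.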

The next task is to apply the obtained results to an explicit example where we can
use (13).

5. Application

To work with the theory presented above we need a lazy and reversible Markov
chain. In the following a construction of a reversible Markov scheme, using the
Metropolis algorithm, is provided. Having this scheme we make it lazy and carry
the conductance properties over to the new chain. This laziness is easily obtained
by adding a coin tossing step, where we accept the new state when heads occurs and
otherwise stay at the current one.

Now a brief introduction to the already mentioned Metropolis algorithm is given;
for details see [RR04] or [MN07]. Let $\Omega \subset \mathbb{R}^d$ be a convex body and let $\mathcal{M} =
(\Omega, \mathcal{L}(\Omega), \{Q(x, \cdot) : x \in \Omega\})$ be a reversible Markov scheme with respect to a
distribution $\mu$. With $\mathcal{L}(\Omega)$ we denote the Lebesgue $\sigma$-algebra of $\Omega$ and $Q(x, A)$ is the
transition kernel. The aim is to simulate a distribution $\mu_\rho$ on the measurable space
$(\Omega, \mathcal{L}(\Omega))$, which is defined by an unnormalised density $\rho$ such that

\[ \mu_\rho(A) = \frac{\int_{A} \rho(x)\,\mu(dx)}{\int_{\Omega} \rho(x)\,\mu(dx)} . \tag{14} \]

It is required that we have an oracle for the evaluation of $\rho$. In this setup a Metropolis
step works as described in Algorithm 2. The procedure rand() returns a uniformly
distributed random number between zero and one.

$X_{n+1}$ = metro_step($X_n$, $\rho(\cdot)$, $Q(X_n, \cdot)$)

(1) choose $Y$ from $Q(X_n, \cdot)$;

(2) calculate $\gamma := \rho(Y)/\rho(X_n)$;

(3) if $\gamma >$ rand() then

return $Y$;

(4) else return $X_n$.

Algorithm 2: Metropolis step from $X_n$ to $X_{n+1}$

If we choose a starting point $X_0$
from a known distribution and take this as input in the method, then we obtain,
after repeating Algorithm 2, a Markov chain on $\Omega$. The corresponding Markov kernel
is defined by

\[ K_\rho(x, A) := \int_{A} \theta(x,y)\, Q(x,dy) + I(x, A) \left( 1 - \int_{\Omega} \theta(x,y)\, Q(x,dy) \right), \tag{15} \]

where

\[ I(x, A) = \begin{cases} 1, & x \in A, \\ 0, & x \in A^c, \end{cases} \qquad\text{and}\qquad \theta(x,y) := \min\left\{ 1, \frac{\rho(y)}{\rho(x)} \right\}. \]

The next result confirms that the resulting Markov scheme $\mathcal{M}_\rho = (\Omega, \mathcal{L}(\Omega),
\{K_\rho(x, \cdot) : x \in \Omega\})$ is reversible with respect to $\mu_\rho$.

Lemma 10. If the proposal Markov scheme $\mathcal{M}$ of the Metropolis-Hastings method
is reversible with respect to a distribution $\mu$, then the reversibility condition holds
also for $\mathcal{M}_\rho$ with respect to $\mu_\rho$.

Proof. It is enough to show that the identity

\[ \int_{A} K_\rho(x, B)\,\mu_\rho(dx) = \int_{B} K_\rho(x, A)\,\mu_\rho(dx) \]

for disjoint sets $A, B \in \mathcal{L}(\Omega)$ is true. Furthermore $\theta(y,x)\rho(y) = \theta(x,y)\rho(x)$ for
$x, y \in \Omega$ and we define $k := \int_{\Omega} \rho(x)\,\mu(dx)$. Hence this implies

\begin{align*}
\int_{A} K_\rho(x, B)\,\mu_\rho(dx)
 &\overset{(15)}{=} \int_{A}\int_{B} \theta(x,y)\, Q(x,dy)\,\mu_\rho(dx) \\
 &\overset{(14)}{=} \frac{1}{k} \int_{A}\int_{B} \theta(x,y)\rho(x)\, Q(x,dy)\,\mu(dx) \\
 &= \frac{1}{k} \int_{\Omega}\int_{\Omega} 1_A(x) 1_B(y)\, \theta(x,y)\rho(x)\, Q(x,dy)\,\mu(dx) \\
 &\overset{(2)}{=} \frac{1}{k} \int_{\Omega}\int_{\Omega} 1_A(y) 1_B(x)\, \theta(y,x)\rho(y)\, Q(x,dy)\,\mu(dx) \\
 &= \frac{1}{k} \int_{B}\int_{A} \theta(x,y)\rho(x)\, Q(x,dy)\,\mu(dx) = \int_{B} K_\rho(x, A)\,\mu_\rho(dx).
\end{align*}

□
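On a finite state space the statement of Lemma 10 reduces to detailed balance and can be verified directly. The sketch below is an illustration under assumptions not made in the paper: $\mu$ is the counting measure, the proposal is given as a symmetric matrix `Q`, and the names `metropolis_kernel` and `is_reversible` as well as the three-state example are hypothetical.

```python
def metropolis_kernel(Q, rho):
    """Metropolis kernel built from a proposal matrix Q and unnormalised weights rho.

    Off the diagonal K(x, y) = min{1, rho(y)/rho(x)} * Q(x, y); the rejected
    mass stays at x, which is the finite analogue of (15).
    """
    n = len(rho)
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                K[x][y] = min(1.0, rho[y] / rho[x]) * Q[x][y]
        K[x][x] = 1.0 - sum(K[x])   # diagonal entry is still 0.0 when summed
    return K


def is_reversible(K, weights, tol=1e-12):
    """Detailed balance: weights(x) K(x, y) = weights(y) K(y, x) for all x, y."""
    n = len(weights)
    return all(abs(weights[x] * K[x][y] - weights[y] * K[y][x]) <= tol
               for x in range(n) for y in range(n))


# Hypothetical example: symmetric proposal on three states.
Q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
rho = [1.0, 2.0, 4.0]
K = metropolis_kernel(Q, rho)
```

Detailed balance with respect to the unnormalised weights is enough here, since the normalising constant cancels on both sides.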



Summarizing, we have so far a reversible Markov chain on the state space $\Omega$. To
apply the theory developed in Section 4 the laziness property must be fulfilled.
But as already mentioned we just have to flip a coin and stay at the current state
with probability 1/2; otherwise we do one step with the chain. Formally,
we consider $\widetilde{\mathcal{M}}_\rho = (\Omega, \mathcal{L}(\Omega), \{\widetilde{K}_\rho(x, \cdot) : x \in \Omega\})$, where

\[ \widetilde{K}_\rho(x, A) := \frac{1}{2} K_\rho(x, A) + \frac{1}{2} I(x, A). \]

This Markov scheme is lazy and reversible, and if it is possible to get a lower bound of
the conductance we can apply Theorem 8. Therefore the following result is helpful.

Lemma 11. Let $\mathcal{M} = (\Omega, \mathcal{A}, \{K(x, \cdot) : x \in \Omega\})$ be an arbitrary reversible Markov
scheme with respect to $\pi$. The conductance of $\widetilde{\mathcal{M}} = (\Omega, \mathcal{A}, \{\widetilde{K}(x, \cdot) : x \in \Omega\})$, where
$\widetilde{K}(x, A) = \frac{1}{2} K(x, A) + \frac{1}{2} I(x, A)$, is bounded from below, i.e.

\[ \varphi(\widetilde{K}, \pi) \ge \frac{1}{2}\, \varphi(K, \pi). \]

Proof. The result is obvious after taking the definition of the conductance into
account. □
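Lemma 11 can be illustrated on a tiny example: mixing a kernel with the identity halves every probability of leaving a set, so the conductance is halved. The sketch below assumes a finite state space with a kernel matrix and a stationary vector; the helper names and the four-state cycle are hypothetical, and the brute-force enumeration is only feasible for a handful of states.

```python
from itertools import combinations


def conductance(K, pi):
    """Brute-force conductance over all sets A with 0 < pi(A) <= 1/2."""
    n = len(pi)
    best = float("inf")
    for size in range(1, n):
        for A in combinations(range(n), size):
            mass = sum(pi[x] for x in A)
            if not 0.0 < mass <= 0.5 + 1e-12:
                continue
            inside = set(A)
            flow = sum(pi[x] * K[x][y]
                       for x in A for y in range(n) if y not in inside)
            best = min(best, flow / mass)
    return best


def lazify(K):
    """Lazy kernel: K/2 + I/2."""
    n = len(K)
    return [[0.5 * K[x][y] + (0.5 if x == y else 0.0) for y in range(n)]
            for x in range(n)]


# Hypothetical example: simple random walk on a cycle with four states.
n = 4
K = [[0.5 if (x - y) % n in (1, n - 1) else 0.0 for y in range(n)]
     for x in range(n)]
pi = [0.25] * n
phi = conductance(K, pi)
phi_lazy = conductance(lazify(K), pi)
```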

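Remark 6. We turned from the Metropolis chain to the lazy one. Another way
would be to "lazify" the proposal chain and after that turn to the Metropolis one.
This is equivalent: with the lazy proposal $\widetilde{Q}(x, dy) = \frac{1}{2} Q(x, dy) + \frac{1}{2} I(x, dy)$ and
$\theta(x,x) = 1$ we get

\begin{align*}
K_\rho^{\widetilde{Q}}(x, A)
 &= \int_{A} \theta(x,y) \Bigl( \tfrac{1}{2} Q(x,dy) + \tfrac{1}{2} I(x,dy) \Bigr)
  + I(x, A) \Bigl( 1 - \int_{\Omega} \theta(x,y) \Bigl( \tfrac{1}{2} Q(x,dy) + \tfrac{1}{2} I(x,dy) \Bigr) \Bigr) \\
 &= \frac{1}{2} \int_{A} \theta(x,y)\, Q(x,dy) + \frac{1}{2} I(x, A) \Bigl( 1 - \int_{\Omega} \theta(x,y)\, Q(x,dy) \Bigr) + \frac{1}{2} I(x, A) \\
 &= \frac{1}{2} K_\rho^{Q}(x, A) + \frac{1}{2} I(x, A).
\end{align*}

5.1. Metropolis algorithm based on the ball walk. We come to a concrete
proposal Markov chain, which is defined by a $\delta$ ball walk on the convex body $\Omega$.
This random walk is the same as the one studied in [MN07] and in various
references on volume computation, see e.g. [LS93, Vem05, Vem02]. The corresponding
Markov scheme is $\mathcal{M}_\delta = (\Omega, \mathcal{L}(\Omega), \{Q_\delta(x, \cdot) : x \in \Omega\})$, where

\[ Q_\delta(x, A) := \frac{\mathrm{vol}(A \cap B(x,\delta))}{\mathrm{vol}(\delta B^d)} + \left( 1 - \frac{\mathrm{vol}(\Omega \cap B(x,\delta))}{\mathrm{vol}(\delta B^d)} \right) I(x, A). \]

There $B(x, \delta)$ denotes the ball of radius $\delta$ around $x \in \Omega$ and $\delta B^d := B(0, \delta)$. We
choose $\delta \le D$, where $D$ is the diameter of $\Omega$. It is easily seen that $\mathcal{M}_\delta$ is reversible
with respect to the uniform distribution on $\Omega$. By taking this ball walk as proposal kernel
for the Metropolis algorithm we get $\mathcal{M}_{\rho,\delta} = (\Omega, \mathcal{L}(\Omega), \{K_{\rho,\delta}(x, \cdot) : x \in \Omega\})$, where

\[ K_{\rho,\delta}(x, A) := \int_{A} \theta(x,y)\, Q_\delta(x,dy) + I(x, A) \left( 1 - \int_{\Omega} \theta(x,y)\, Q_\delta(x,dy) \right). \]

In [MN07] the authors showed that the conductance of the resulting chain is positive
if the density is logconcave and log-Lipschitz. Therefore we consider

\[ \mathcal{R}^\alpha(\Omega) := \{ \rho : \rho > 0,\ \log \rho \text{ concave},\ |\log \rho(x) - \log \rho(y)| \le \alpha \|x - y\|_2 \}. \]

Some more general distributions are studied in [MR02] and [GK07]. Moreover, if
$\Omega$ is the $d$-dimensional unit ball, denoted by $B^d$, a handy lower bound of the
conductance exists. Thus we can use

Lemma 12. Let the Markov scheme $\mathcal{M}_{\rho,\delta} = (B^d, \mathcal{L}(B^d), \{K_{\rho,\delta}(x, \cdot) : x \in B^d\})$ be
the Metropolis chain based on the local ball walk $\mathcal{M}_\delta$, where $\rho \in \mathcal{R}^\alpha(B^d)$. Then we
obtain for an adapted $\delta = \min\{1/\sqrt{d+1},\, 1/\alpha\}$ the following lower bound of the
conductance:

\[ \varphi(K_{\rho,\delta}, \mu_\rho) \ge 0.0025\, \frac{1}{\sqrt{d+1}} \min\left\{ \frac{1}{\sqrt{d+1}},\, \frac{1}{\alpha} \right\}. \tag{16} \]

Proof. See [MN07, Corollary 1]. □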

The geometry of the unit ball is essentially used, since the ball walk would get stuck
with high probability in domains which have corners.

Having finished this we obtain an explicit error bound of the Markov chain Monte
Carlo method on $\Omega := B^d$ for a function class $\mathcal{F}^\alpha(B^d)$. This class is defined by

\[ \mathcal{F}^\alpha(\Omega) := \{ (f, \rho) : \rho \in \mathcal{R}^\alpha(\Omega),\ \|f\|_\infty \le 1 \}. \]

The method, based on a certain $\delta$ ball walk after a burn-in time $n_0$, is presented
in Algorithm 1, where $S^{\delta}_{n,n_0}(f, \rho) = \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0})$ if $(f, \rho) \in \mathcal{F}^\alpha(B^d)$. At first
we should take care of the starting point in $B^d$. The simplest way to handle this is
to choose the initial state according to the uniform distribution on the state space $B^d$.
So the following calculation for $\nu$, where $A \in \mathcal{L}(B^d)$, holds true:

\[ \nu(A) = \frac{\mathrm{vol}(A)}{\mathrm{vol}(B^d)} = \frac{1}{\mathrm{vol}(B^d)} \int_{A}\int_{B^d} \frac{\rho(y)}{\rho(x)}\,dy\,\mu_\rho(dx). \]

This implies that for $\rho \in \mathcal{R}^\alpha(B^d)$

\[ \left\| \frac{d\nu}{d\mu_\rho} \right\|_\infty \le \exp(2\alpha). \]

Now let us turn to the error of this Markov chain Monte Carlo method
and summarize the previous results.



Theorem 13. Let $X_1, X_2, \dots$ be the lazy Metropolis Markov chain which is based
on a $\delta$ ball walk, where $\delta = \min\{1/\sqrt{d+1},\, 1/\alpha\}$. Furthermore it is required that
$(f, \rho) \in \mathcal{F}^\alpha(B^d)$. Then we get

\[ e(S^{\delta}_{n,n_0}, f) \le 8000\, \frac{\sqrt{d+1}\, \max\{\sqrt{d+1},\, \alpha\}}{\sqrt{n}} , \]

where $n_0 \ge 1280000 \cdot \alpha (d+1) \max\{d+1, \alpha^2\}$.

Proof. Combining the bound on the density of the initial distribution, the lower
bound (16) for the conductance, Lemma 11, Lemma 12 and (13) proves the
claim. □

For an interpretation let us consider the cost of the underlying method. With
Theorem 13 we have

\[ \mathrm{cost}(f, \varepsilon) \le \left\lceil 1280000 \cdot \alpha (d+1) \max\{d+1, \alpha^2\} \right\rceil + \left\lceil 64000000 \cdot (d+1) \max\{d+1, \alpha^2\}\, \varepsilon^{-2} \right\rceil . \]

This shows that the cost depends only polynomially on the dimension and the
Lipschitz constant, so that the suggested algorithm $S^{\delta}_{n,n_0}$ avoids the curse of dimension.
In this setting it is worth mentioning that the number of time steps $n + n_0$ is
proportional to the number of function evaluations of $f$ and $\rho$. We need at most
$n + n_0$ oracle calls for $\rho$ and $n$ for $f$.
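To get a feeling for the size of the constants, the bounds of Theorem 13 can be evaluated for concrete values of the dimension `d`, the Lipschitz constant `alpha` and the target error `eps`. The function names below are hypothetical; the formulas are the burn-in, error and cost bounds stated above.

```python
import math


def theorem13_burn_in(d, alpha):
    """Burn-in n0 = ceil(1280000 * alpha * (d+1) * max{d+1, alpha^2})."""
    return math.ceil(1280000 * alpha * (d + 1) * max(d + 1, alpha ** 2))


def theorem13_error_bound(d, alpha, n):
    """Error bound 8000 * sqrt(d+1) * max{sqrt(d+1), alpha} / sqrt(n)."""
    return 8000 * math.sqrt(d + 1) * max(math.sqrt(d + 1), alpha) / math.sqrt(n)


def theorem13_cost(d, alpha, eps):
    """Upper bound on n + n0 needed for an error of at most eps."""
    n = math.ceil(64000000 * (d + 1) * max(d + 1, alpha ** 2) / eps ** 2)
    return theorem13_burn_in(d, alpha) + n
```

The bound is polynomial in `d` and `alpha`, but the leading constants are large, so it should be read as a tractability statement rather than as a practical recommendation for the number of steps.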